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Abstract 

Deterministic computer simulations are often used as a replacement for com- 
plex physical experiments. Although less expensive than physical experi- 
mentation, computer codes can still be time-consuming to run. An effective 
strategy for exploring the response surface of the deterministic simulator is 
the use of an approximation to the computer code, such as a Gaussian pro- 
cess (GP) model, coupled with a sequential sampling strategy for choosing 
design points that can be used to build the GP model. The ultimate goal 
of such studies is often the estimation of specific features of interest of the 
simulator output, such as the maximum, minimum, or a level set (contour). 
Before approximating such features with the GP model, sufficient runs of the 
computer simulator must be completed. 

Sequential designs with an expected improvement (EI) function can yield 
good estimates of the features with a minimal number of runs. The challenge 
is that the expected improvement function itself is often multimodal and 
difficult to maximize. We develop branch and bound algorithms for efficiently 
maximizing the EI function in specific problems, including the simultaneous 
estimation of a minimum and a maximum, and in the estimation of a contour. 
These branch and bound algorithms outperform other optimization strategies 
such as genetic algorithms, and over a number of sequential design steps can 
lead to dramatically superior accuracy in estimation of features of interest. 
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1. Introduction 



Computer experiments are often employed to simulate complex systems, 
but realistic computer simulators of physical phenomena can be time con- 
suming to evaluate. In such cases it is desirable to perform the experiment 
as efficiently as possible. One popular approach designed for this purpose 
is sequential experimental design (Jones, 2001) whereby an initial design is 
simulated and a surrogate model is fit to the output, often using a Gaussian 
Spatial Process (Sacks et al. , 1989). The surrogate provides estimates at 



all input combinations (sampled and unsampled) with accompanying vari- 
ance estimates and is used, along with a suitable 'infill criterion', to choose 
subsequent trials. 

Scientists are often interested in specific features of the simulator, such as 
the maximum, the minimum, level set(s), etc. In each case, the follow-up trial 
in the sequential design is selected so as to give the greatest improvement to 
the estimate(s) of these feature(s) of interest. A popular approach is to choose 
this new trial by maximizing a merit-based infill criterion. Many criteria have 



been proposed (Jones et al. , 1998) ( Villemonteix et al. , 2006) (Forrester and 



Jones 2008) and one popular choice is the 'expected improvement'. This 



criteria has been shown to be efficient if the initial design is not too sparse 
or deceptive (Forrester and Jones, 2008). 

The form of the improvement function varies with the features of inter- 
est, but in any case facilitates a balance between choosing points that exhibit 
local optimality (greedy search, new trials close to predicted feature of inter- 
est), and points with high uncertainty (global search, new trials away from 
previously sampled points) (Papalambros et al. , 2002). Finding the next 



design point (the point that maximizes the EI function) is necessary in or- 
der to ensure the estimation of features of interest in the smallest number 
of simulator runs. Finding this maximum is a difficult global optimization 
problem, and the addition of each point in the sequential design changes the 
EI function, requiring another search for the global optimum of EI in order 
to find the next point. 

The main focus of this paper is to develop branch and bound algo- 
rithms for maximizing expected improvement functions for two particular 
features of interest; contours, and the simultaneous estimation of maximum 
and minimum. The most challenging aspect of this algorithm is construct- 
ing the bounds on the objective function (expected improvement). Branch 
and bound algorithms have been implemented for finding the minimum of a 
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function (Jones et al. , 1998), and expected improvement criteria have been 



developed for the minimum (Jones et aL |1998 ), and contours (Ranjan et al. 
2008). Our main contribution are in (a) generalizing the expected improve 



ment criterion for simultaneous estimation of the maximum and minimum, 
and (b) developing branch and bound algorithms for maximizing expected 
improvement for contours, and for simultaneous maximum/minimum estima- 
tion. Ranjan et al. (2008) maximized their expected improvement function 



for estimating contours using a genetic algorithm. The branch and bound 
algorithm developed here outperforms their optimization by locating level 
sets using fewer simulator runs. 

This paper is organized as follows: Section [2] gives an overview of the 
Gaussian process model, the expected improvement criterion and a generic 
branch and bound algorithm which will be used to maximize the expected 
improvement over candidate input points. In Section [3] we propose bounds 
on the expected improvement for estimating contours and simultaneous esti- 
mation of the global maximum and minimum. Section |4] presents simulation 
results comparing our branch and bound and a genetic algorithm similar 



to that of Ranjan et al. (2008) for different test functions. We present our 



conclusions and outline possible future work in Section |5| 



2. Review - surrogate model, global optimization 

2.1. Gaussian Process Model 

Let i/i represent the univariate response from the computer simulator, and 
X = (xi,X2, ...,x„)' be the matrix of input vectors x'^ = {xii,Xi2, ■■■,Xid) used 
to generate the respective outputs, with y(X.) = {yi, y2, yn)'- Without loss 
of generality, let the input space be the unit hypercube x = [0, l]*^- Following 
Sacks et al. (1989) we model the output ?/(xj) as 

(1) 



l,...,n. 



where fi is the overall mean, and {z(x.),x G x} is a Gaussian spatial pro- 
cess with mean 0, variance cr^ and corr(2;(xj), 2;(xj)) = Rij. The correlation 
matrix i? is a function of the hyper-parameters = {6i, . . . ,6d)- In general, 
y{X) has multivariate normal distribution, y{X) ~ A^„(lnyU, cr^R), where the 
parameters Q = (^i, 6'^; /i, cr^) are estimated by maximizing the likelihood. 
The mean and variance parameters have closed form estimates given by 



/i 



-1 



l^)-H'^R-'y 



and 



0", 



[y - lr,fi)'R l{y - InA) 



n 



(2) 
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whereas the hyper-parameters 6 = (^i,...,^^^) are estimated by maximizing 
the profile likelihood. 

The best linear unbiased predictor (BLUP) at x*, y{x*), and the associ- 
ated mean squared error (MSE) are given by 

y{x*) = ix + r'R-\y-lr,ix), (3) 
s'iyix*)) = ol {\ - r'i?-V + , (4) 

where r = {r\{x*\ ...,r„(x*))' and ri(a;*) = corr(z(x*), The estimates 

y{xf ) and s(a;*) are obtained by replacing i?, r and cr^ with -R(^), r(ff) 
and (T?. There are innumerable choices for the correlation structure, for 



instance, power exponential correlation and Matern correlation (see Stein 



(1999); Santner et al. (2003) for details). The choice of correlation function 
is not crucial to the algorithms and methodologies developed here, and thus 
we omit the details. In all illustrations, the power exponential correlation 
function is used. 

It turns out that if the design points are close together in the input space 
the correlation matrix R can sometimes be near-singular which results in 
unstable computation. A popular approach to overcome this problem is to 
introduce a small nugget < 5 < 1 and approximate the ill-conditioned R~^ 
with a well-conditioned [R -|- ^/)^^, where J is the nx n identity matrix. In 
later sections, a nugget is included in the GP model. 



2.2. Sequential design using EI criterion 

For expensive computer models the total number of simulator runs is 
limited; and thus the choice of design points is crucial for estimating certain 
pre-specified features or approximating the underlying process. If we are 
interested in only certain features (e.g., global maxima, contours, and so on) 
of the process, a popular approach is to use a sequential sampling strategy. 
The key steps of such a sampling strategy are summarized as follows: 

1. Choose an initial design {xi, 

2. Run the simulator, /, for these design settings to get {yi, yno}i where 
yi = f{xi). Set n = uq. 

3. Fit a surrogate to the data {{xi,yi),i = 1, ...,n} (we use the Gaussian 
process model). 

4. Choose a new trial x^w thai leads to improvement in the estimate of 
the feature of interest. 
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5. Run the simulator for the new trial location and update the data Xn+i = 

Xnew: Un+l = f{Xnew) and U = Tl -\- 1. 

6. Repeat 3-5 until n exceeds a predetermined limit. 

Developing feature specific criteria for selecting new trials has gained con- 
siderable attention in the computer experiment community. For instance, 



Jones et al. (1998) proposed an expected improvement (EI) criterion for 
estimating the global minimum of an expensive computer simulator. Let 
I{x) denote the improvement function for estimating a particular feature 



of interest. Then, the proposed criterion in Jones et al. (1998) is E[Ii{x)], 
where Ii{x) = max{/mm — yix),0}, fmin is the current best estimate of 
the function minimum, and the expectation is taken over the distribution of 
y{x) ~ N{y{x), s'^{x)). That is, the new trial in Step 4 is the maximizer of 



(5) 



where u = {fmin — y{x))/s{x), and 0(-) and $(■) denote standard normal 
pdf and cdf respectively. One of the most attractive characteristics of the EI 
criterion is that it exhibits a balance between 'global' and 'local' search. The 
first term in ^ supports global search, and the second term, local search. 

Since the main objective of performing a good design is to attain (or at 
least reach a good approximation of) the global minimum in as few simula- 
tor runs as possible, efficient optimization of the EI function becomes very 
important. The EI functions are often multimodal and the location, number 



and heights of these peaks change after adding every new trial. Jones et al. 



(1998) developed a branch and bound (BNB) algorithm for optimizing the 
EI function (|5|, which we review briefly in the next section. 

2.3. Branch and bound algorithm: review for finding a global minimum 
BNB algorithms are often used for global optimization of non-convex 



functions (Lawler and Wood 1966 Moore 1991). The BNB algorithm pre- 
sented here finds the global minimum of a real-valued function g. In our 
case, g{x) = —E[I{x)] is defined on the (i-dimensional hypercube Qinit = 
X = [0, 1]*^ for a specific feature of interest. 

There are three key components of a BNB algorithm - (i) branching (ii) 
bounding and (iii) pruning. The branching component uses a splitting pro- 
cedure that, given a rectangle Q C Qmit, returns Qi and Qu such that 
Qi n Qii = (f) (null) and Qi U Qu = Q. The splitting occurs along the 
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longest edge of Q (if there is a tie, one is chosen randomly). Bounding is 
the second component. Bounding the objective function requires finding 
the lower and upper bounds, "^ih and "^uh, of the minimum value of g. Let 
^mm(Q) = min^gQ5((x) for every Q C Qm, then and must satisfy 

R2 : Ve>035>0 such that for all Q C Qi^^, (6) 

|Q|<5 ^ ■^ub{Q)-'^ib{Q)<t. 



As in Balakrishnan et al. (1991), \Q\ is the length of the longest edge of 



rectangle Q. Although BNB algorithms guarantee the global minimum of 
g with a pre-specified tolerance e, the bounding functions "^n, and '^ub are 
objective function specific and often nontrivial to derive. This is the most 
challenging part of a BNB algorithm. Appropriate lower and upper bounds 
can be constructed for maximizing the EI criteria for a few specific process 
features of interest, and this is the main contribution of the paper. The third 
component of the BNB algorithm is pruning, in which the algorithm removes 
rectangles Q from the set of all rectangles £, that have lower bound greater 
than the upper bound of some other rectangle in £. In the BNB algorithm 
outlined below, individual rectangles are represented by Q's and lists of rect- 
angles are represented by £'s: 

1 A; = (initialize counter) 

2 £o = {Qinit} (initialize rectangle list) 

3 Lo = ^IbiQinit] 

4 f/o = ^ubiQimt} 

5 while Uk — Lj. > e 

6 Pick Q e £fc : -^ibiQ) = Lk 

7 Split Q along one of its longest edges into Qi andQn 

8 := (£A{2}) U {Q/, Qii} (remove Q, replace with 

Qi and Qu to get £fc+i) 



9 L 



k+l 



10 Uk+1 

11 -Cfe+i 

12 k = k + l 

13 end 



minQ^s^^^^^ibiQ) 
minQ^st^^^^i ub{Q) 

^k+i\{Q e 2k+i : ^ife(Q) > Uk+i} 



In this algorithm, k is the iteration index, S^k+i is the list of hyper-rectangles. 
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(Lfc+i, Uk+i) are the smallest lower and upper bounds for min{Qinit) (in our 
case minjjg^ —E[I[x)]) after k + 1 iterations, and e is the desired precision and 
is fixed beforehand. Steps 6-8 correspond to branching, Steps 9-10 bounding, 
and Step 11 represents pruning of the hyper-rectangles. 

The EI function, E[Ii{x)], for finding the global minimum of the simu- 
lator, is a function of the input location x via the predicted response y{x), 
the associated uncertainty s'^{x) at x and the current estimate of the global 



minimum fmin (in general the estimate of the feature of interest). Jones et al. 



( 1998[ ) note that E[Ii{x)] is monotonic with respect to (w.r.t.) y{x) and s{x). 



Let EIi{s{x),y{x)) denote E[Ii{x)] for all x G Taking partial derivatives 
of ([5]) w.r.t. y{x) and s{x) give 

9^ = ^^^") 9^ = -^^"^- 

The partial derivatives dEIi/ds{x) > and dEIi/dy{x) < for all x G x- 
Hence if y{x) and s{x) can be bounded by {yib{Q), yub{Q)) and {sib{Q), Sub{Q)) 
respectively over a hyper-rectangle Q G -C, then for every x G Q, the lower 
and upper bounds E[Ii{x)]ub = ^^wiQ) and E[Ii{x)]ib = "^ubiQ) are 

E[h{x)]ib = Eh{sib{Q),yub{Q)), 
E[h{x)U = Eh{s^,{Q),yi,{Q)), 

where yib{Q),yub{Q), sib{Q) and Sub{Q) are the lower and upper bounds on 
y{x) and s{x) over Q. These bounds are needed in Steps 6-11. In practice 
s{x) is replaced by its predicted value s{x). This approach of bounding the 
EI function is efficient, as bounding s{x) and y{x) is relatively easier (see 



Jones et al. ( 1998 ) for details) 



3. BNB Algorithms for New EI Criteria 

In this section we first propose a generalization of the expected improve- 
ment criterion developed by Jones et al. ( 1998 ) for simultaneously estimating 
the maximum and minimum of an expensive deterministic computer simula- 
tor. Next we develop a BNB algorithm for maximizing this new EI criterion. 



We also propose a modification in the EI criterion developed by Ranjan et al. 



(2008) for estimating a pre-specified contour. This modification facilitates 
the development of a BNB algorithm for maximizing the modified EI crite- 
rion, and still maintains the global versus local search trade-off. 
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3.1. EI criterion for maximum and minimum 

We propose an improvement function for simultaneous estimation of the 
global maximum and minimum. This feature could be of specific interest, 
for instance, if one wishes to identify best-case and worst-case scenarios in a 
global climate change model, or maximal and minimal projected unemploy- 
ment rates in a complex model of the economy. The improvement function 
can be written as 

hi^x) = max{(y(x) - fraax), (frmn " ^(x)), 0}, (7) 

where fmax and fmin are current best estimates of the global maximum and 
minimum respectively. The corresponding expected improvement criterion is 
obtained by taking the expectation of hix) with respect to the distribution 

of y{x) ~ N{y{x),s'^{x)), 

Elhix)] = + {y- fmax)^{Ul) + S(f){u2) + {fmin " y)^{u2), (8) 



where ui = {y - fmax)/s and U2 = {fmin - y)/s. The EI criterion ^ turns 
out to be the sum of the two EI criteria for individually estimating the global 
maximum and the global minimum. Optimization of (|8| favors subsequent 
sampling in the neighbourhood of the global maximum if {y — fmax)^{ui) is 
the dominating term in the sum, the global minimum if {fmin ~ y)^{u2) is 
the dominant term, and otherwise in the less explored regions to minimize 
overall variability. 

As in optimization of the EI criterion (5) for estimating the global min- 
imum, a BNB algorithm can be developed for optimizing ([s]). The most 
difficult part, computation of the lower and upper bounds of ([s]) that sat- 
isfy Rl and R2 in (g for minimizing g{x) = —E[l2{x)], is accomplished by 
monotonicity of the E[l2{x)] w.r.t. y{x) and s{x). Let El2{s{x),y{x)) denote 
E[l2{x)] for all x G x- Then the two partial derivatives of E[l2{x)] are 

1^ = {ui) + (Ma) and |^ = $ {m) - $ (wa) • 

The partial derivative w.r.t. s{x) is positive for all x G and the partial 
derivative w.r.t. y{x) is equal to zero when y(x) = _ Moreover, it 

is straightforward to show that 

{,f max ~\~ fmin) 
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Hence E[l2{x)] is monotonic for all x G x w.r.t. s{x) and piecewise monotonic 
w.r.t. y(x). Given the upper and lower bounds on ^(x) and s(x) in a hyper- 
rectangle Q C Qinit, somewhat conservative bounds on E[l2{x)], for every 
X e Q, are 

E[l2{x)]ib = mm{El2{sib{Q),yib{Q)),El2{sib{Q),yub{Q))}, 
E[l2{x)]ub = max{El2{sub{Q),yib{Q)),El2{sub{Q),yub{Q))}. 

That is, the upper bound of E[l2{x)] in Q is calculated at SubiQ), and due 
to piecewise monotonicity of E[l2{x)], corresponds to the y(x) in Q that is 
closest to fmin OT fmax- Similarly, the lower bound of E[l2{x)] is calculated 
using sib{Q) and the y(x) in Q that is closest to {fmax + fmin)/'^- 

These bounds are conservative because the lower and upper bounds of 
s{x) and minimizer of \y{x) — {fmax + /mm)/2| may not correspond to the 
same point in the input space. The lower bound '^ib{Q) of min.xi^Q{—E[l2{x)]) 
will be equal to the true minimum of — i?[/2(x)] only if the minimizers of s{x) 
and \y{x) — {fmax + /mm)/2| are identical. As a result, the hyper-rectangles 
in £ are pruned at a slower rate. 



3.2. BNB algorithm for contour estimation problem 

The use of a EI criterion for estimating a contour (level set) of an expen- 
sive computer simulator was proposed by Ranjan et al. ( |2008 ). The motivat- 



ing application involved distinguishing "good" from "bad" performance of a 
server in a one-server-two-queue network queuing simulator. The proposed 
improvement function for estimating the contour S{a) = {x : y{x) = a} was 
l3{x) = e^(x) — mm{{y{x) — a)^,e^(x)}, where e{x) = as{x), for a positive 
constant a, defines a neighbourhood around the contour. The expectation 
of I{x) w.r.t. y{x) ~ N{y{x), s^(x)) is given by 

E[h{x)] = [e'{x) -{a- ^(x))^] [$ (m) - $ (n^)] (9) 

U2 



— 2{a — y{x))s{x) [(f) {ui) — (f) {U2)] — s'^{x) j vo^ (j) {w) dw 



where ui = {a — y{x) + e{x))/ s{x), U2 = {a — y{x) — e{x))/ s{x), and and 
$(■) are the standard normal pdf and cdf respectively. [Ranjan et al.| (2008) 
use i?[/3(a;)] as the EI criterion for selecting additional new trials in the se- 
quential sampling strategy. However, controlling the trade-off between local 
and global search has gained attention in the area of computer experiments 
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(e.g., Schonlau, Welch and Jones (1998), Sobester, Leary, and Keane (2005)). 
This motivated us to modify the EI criterion in ^ that will facilitate the 
construction of the lower and upper bounds which satisfy Rl and R2 in (|6|, 
and still entertains the global versus local search trade-off. 

Note that the third term in ^ represents the total uncertainty in the 
e- neighbourhood of the contour at x, and can be dropped without altering 
the important features of the criterion. We propose a modified EI criterion, 
obtained by dropping the third term in (9) and using a change of variable 
{y{x),s{x)} -> {t{x),s{x)}, given by 

E[i;{x)] = s\x) [a^ - t^] [$ (t + a) - $ (t - a)] (10) 
- 2ts^{x) [0 (t + a) - (t - a)] , 

where t = {a — y{x))/ s{x). Although the proposed modification slightly 



alters the tradeoff between the global and local search (see Section 4.1 for 
more details), it allows the partial derivatives of E[I^{x)] to be piecewise 
monotone. Furthermore, this facilitates easy construction of lower and upper 
bounds in the BNB algorithm for efficient optimization of Em{x)]. 

Let EI^{s{x),t{x)) denote Em{x)] for all x E x- Then, the partial 
derivative of Em{x)] w.r.t. t = t{x) is 

dEI* 

= -2s^{x)t[^{t + a)-^{t-a)] + 2as^{x)t[(f){t + a) + (t){t-a)] 
+ s^{x){a^ + - 2)[0(t + a) - (pit - a)]. (11) 

It turns out that the sign of the partial derivative dEI^/dt depends on the 
sign of t = (a — y{x))/s{x). Since the normal pdf is symmetric around the 
mean (i.e., 0('u) = 0(— m), for all u), the partial derivative dEI^/dt = at 
t = 0. In general, 

r <o ift>o 

>0 ift<0 



Due to the presence of normal pdf and cdf in (11), it may not be straight- 
forward to analytically prove the monotonicity result, however, it can easily 
be plotted (see Figure [T]). 

There are a few points worth noting. For instance, the height of these 
peaks, in Figure 1, increases with s. The magnitude and sign of the derivative 
depend on the choice of a (we used a = 2). Interestingly, the nature of the 
curves remain the same for a > 1 (approximately). 
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2 
t(x) = (a-y*(x))/s(x) 



Figure 1: Partial derivative of EI^{t(x), s{x)) w.r.t. t{x) evaluated at a = 2. 



The partial derivative EI^ w.r.t. s{x) is given by 

2s{x){a^ - + a) - <l>(t - a)] 

-Ats{x)[(j){t + a) - (f){t - a)]. 



dEI* 
ds{x) 



(12) 



The partial derivative dEI^/ds{x) is always non-negative. However, as 
with the monotonicity of EI^{s{x),t{x)) with respect to t{x), analytic proof 
may be somewhat complicated. Figure [2] displays the partial derivative 
dEI^/ds{x) as a function of s{x). 

As in the simultaneous estimation of the global maximum and minimum 
case, bounds on Em{x)] for every x E Q are obtained using the bounds on 
t{x) and s{x) in the hyper-rectangle Q E 2. The bounds are 

E[i;{x)]it = mm{Ei;{si,{Q),ti,{Q)),Ei;{sit{Q),tub{Q))}, 
E[i;{x)U = max{Ei;{subiQ),titiQ)),Ei;{s^,{Q),t^,{Q))}. 

Because EI^{s{x),t{x)) is an increasing function of the lower bound of 
E[I^{x)] in Q is obtained using sib{Q), and the value of y{x) farthest from 
the estimated contour. 
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Figure 2: Partial derivative of EI^{t{x), s{x)) w.r.t. s{x) evaluated at a = 2. 



Ranjan et al. (2008 ) used a genetic algorithm for maximizing the expected 



improvement criterion (ph. We will show in Sections 4.2 and 4.3 that the 



proposed BNB algorithm outperforms the implementation of Ranjan et al. 



(2008) and can save simulator evaluations. 



4. Simulations 



This section presents comparison between the proposed branch and bound 
algorithm, a genetic algorithm (GA) and a non-sequential (static) design for 
maximizing the EI criteria for various test functions. Two approaches are 
taken to compare BNB, GA and a static design approach. The first we will 
call the "direct comparison", and was designed to reduce all noise external 
to the two optimization algorithms (BNB and GA) so that the EI criterion 
values chosen by each method can be directly compared. The second, the 
"long run comparison", demonstrates the average performance of the two 
algorithms by comparing the estimated and true features of interest as the 
additional new trials are augmented. In both cases, complex test functions 
are used for generating computer simulator outputs. 

The GA we implemented to maximize the expected improvement function 
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can be described as follows (see 



Ranjan et al. 



(|2008l) for details): 



for i=l . .number_multistarts 

Generate an initial population X_init (of size n_init) 
for j=l . .number_generations 

Mutation - Randomly perturb X_init to get X_pert 
Augmentation: X_aug = [X_init; X_pert] 
Crossover - Randomly swap elements of X_aug to get X_cross 

Augmentation: X_aug = [X_aug; X_cross] 
Selection - Evaluate EI for every candidate solution in 
X_aug and retain n_init members with highest EI value. 

end 
end 

GAs are well-known algorithms that can produce competitive results. In 
the GA we used, the magnitude of the mutation was variable and perturbed 
each dimension ±(0..5)% of the original value. The initial population was 
generated with the random Latin hypercube function Ihsdesign in Matlab 
7.5.0. Crossover was between randomly selected pairs, in random dimensions. 



See Holland (1975) and Mitchell (1996) for more information 



Before we compare the optimization power of the proposed BNB algo- 
rithms and the GA, we present results from an empirical study on the trade- 
off between local versus global search in EI^ and -E/3, for estimating a pre- 
specified contour. 



Comparison between EI3 and EI^ 

Both EI3 and EI^ consist of terms that encourage local and global search. 
The most important difference is the balance between the local and the global 
searches. The entries in Table [T] denote the proportion of additional trials 
that were chosen in the neighborhood of a pre-specified contour, i.e., the 
proportion of new trials that favored local search. Table [T] presents the 
results when the computer simulator outputs were generated from the Branin 
and two-dimensional Levy functions (see Section 4.3 for details on these test 
functions). The results presented here are averaged over 100 realization of 
random maximin Latin hypercube designs of size uq = 20 for initial design. 
For each realization, the new trials were chosen by maximizing the two EI 
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criteria using GA. The contours of interest were y = 45 and y = 70 for 
the Branin and Levy functions respectively. This simulation will display 
whether there are practical differences between the two criteria in terms of 
their balance between local and global search. 



Table 1: Proportion of additional trials chosen in the neighbourhood of a pre- 
specified contour, by EI^ and EI^ criteria for estimating pre-specified contours 



number of 
points added 


Branin y 
Eh 


= 45 Contour 
EI* 


Levy 2D y 
Eh 


= 70 Contour 
Eh 


k = 5 


0.55 


0.66 


0.49 


0.55 


k=10 


0.72 


0.80 


0.44 


0.53 


k = 20 


0.85 


0.89 


0.39 


0.54 


k = 30 


0.89 


0.93 


0.37 


0.55 



A new point was considered to be in the local neighbourhood if f{xnew) 
was within (40, 50) for the Branin function, and within (60, 80) for the Levy 
function. For both functions more points are added in the local neighbour- 
hood with the new criterion, most notably in estimating the Levy contour 
when k = 30 new points were added. 

The practical implications of this simulation are that the new criterion 
allocates slightly more points for local search. This is expected as the term 
with integral was removed from ([9]), and it contributes to the global search. 
This is a desirable characteristic of an expected improvement criterion if the 
underlying surface is relatively smooth. If it is known that the simulator 
response surface is relatively bumpy, the local and global search terms in 



(10) could be reweighted as in Sobester et al. (2005). This reweighted EI will 



still enable easy construction of the lower and upper bounds for the branch 
and bound algorithm. 



4-2. Direct comparison 

The objective of this simulation is to compare BNB and GA optimization 
of the EI criterion for a single step of the sequential design algorithm, starting 
from the same initial design. 

To begin with we follow Steps 1-3 of the sequential design strategy out- 
lined in Section 2.2 to obtain a fitted surface. The fitted CP model serves 
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as input for the BNB and GA optimizer. These two methods search for the 
maximum of the EI criterion over Qinit = [0, l]*^, and then the estimates 
of the maximum EI are compared. The two optimizers BNB and GA use 
exactly the same inputs (MLE of the parameters in GP model, initial trials 
Xinit, y{Xinit), and the current best estimate of the feature of interest) in this 
comparison, and so their output are directly comparable, though subject to 
small approximation. The approximation in BNB comes from estimating the 
bounds of y(x), s(x) based on points in each rectangle Q G Qmit = [0, l]*^. 
For example the upper bound of y(x) in some rectangle Q is estimated by 
the maximum observed ^(x) for x G Q from the GP model fit. Thus, the 
comparison will be between a 'stochastic version' of the branch and bound 
algorithm developed in Section 3, and the genetic algorithm outlined earlier. 
For all the comparisons presented here the number of evaluations of the EI 
criterion will be fixed for both BNB and GA; 500 for 2-dimensional, and 
3000 for 4-dimensional test functions. In actual applications, the use of a 
a tolerance e for BNB might be preferred to the evaluation budget adopted 
here. The evaluation budget facilitates direct comparison. 

The entries in Tables |2] and 3 summarize the EI values found by BNB 
and GA for two features of interest (contour, maximum and minimum) and 
three test functions (Branin, two-dimensional and four-dimensional Levy). 
The numbers are the maximum EI values averaged over approximately 500 
different initial random maximin Latin hypercube designs. A few simulations 
resulted in bad GP fit (especially when fitting the complex Levy functions, see 
Figure 6) and both BNB and GA results for those simulations were excluded 
from average calculation. The numbers in parentheses show the standard 
errors of these EI values {s/y/n = s/v^500). For example, in the first row of 
Table 2, 9.85 and 7.05 are the average maximum EI values obtained by BNB 
and GA, and the second row contains the corresponding sample standard 
error of 0.18 and 0.20 respectively. 

Table |2] shows that for every case (choice of no, feature of interest and test 
function) BNB maximizes E[l2{x)] more effectively than GA. In some cases 
the two methods are closer - for instance with 40 initial points in simultaneous 
estimation of the maximum and minimum for the 2D Levy function the two 
methods are practically indistinguishable. But is most of these simulations 
BNB locates significantly higher EI than GA. 

Analogous to the results presented in Table |2| Table |3] presents the results 
for the four- dimensional Levy function. The features of interest are the 
contour at y = 180 and simultaneous determination of the maximum and 
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Table 2: EI optimization using BNB and GA for simultaneous estimation of the 
global maximum and minimum, and contours for the Branin and 2D Levy functions 



Maximum & minimum 






Branin 


Levy 2D 






BNB 


GA 


BNB 


GA 


no 


= 10 


9.85 


7.05 


3.58 


2.74 






(0.18) 


(0.20) 


(0.23) 


(0.18) 


no 


= 20 


7.46 


4.45 


1.33 


1.13 






(0.17) 


(0.18) 


(0.11) 


(0.10) 


no 


= 30 


6.51 


3.48 


0.76 


0.66 






(0.17) 


(0.16) 


(0.06) 


(0.05) 


no 


= 40 


5.61 


2.75 


0.68 


0.66 






(0.18) 


(0.14) 


(0.05) 


(0.04) 






Contour at y 


= a 








Branin 


(a = 45) 


Levy 2D 


(a = 70) 






BNB 


GA 


BNB 


GA 


no 


= 10 


29.79 


18.12 


62.70 


41.88 






(0.67) 


(0.49) 


(6.19) 


(4.81) 


no 


= 20 


8.02 


4.04 


55.64 


40.23 






(0.24) 


(0.15) 


(5.14) 


(4.27) 


no 


= 30 


3.54 


1.90 


37.98 


24.15 






(0.15) 


(0.09) 


(3.58) 


(2.57) 


no 


= 40 


1.76 


0.93 


29.13 


15.65 






(0.08) 


(0.05) 


(3.19) 


(1.99) 



minimum. Results are presented when BNB and GA are allocated 3000 
evaluations of the EI criterion (Tables [2] and |3] respectively). For the 4D 
Levy function, BNB achieves slightly better results than GA for both of 
these EI criteria. 
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Table 3: EI optimization using BNB and GA for estimation of the y = 180 contour, 
and simultaneous estimation of the maximum and minimum for 4D Levy function 



4 Dimensional Levy Function 





Contour 
BNB 


y=180 
GA 


Maximum and Minimum 
BNB GA 


no 


= 30 


516.1 


509.8 


11.79 


10.63 






(38.39) 


(38.53) 


(0.54) 


(0.47) 


no 


= 40 


325.5 


315.4 


12.46 


10.80 






(27.14) 


(25.27) 


(0.56) 


(0.50) 


no 


= 50 


230.5 


227.2 


8.99 


8.13 






(16.62) 


(15.52) 


(0.38) 


(0.33) 


no 


= 60 


172.1 


170.4 


9.12 


8.35 






(14.90) 


(13.61) 


(0.39) 


(0.35) 



It should be noted that one of the principal advantages of the branch 
and bound algorithm is that it is designed to give results within e of the true 
maximum of E[I{x)]. Fixing the number of EI evaluations (and consequently 
the number of iterations of the BNB/GA optimization algorithm) though 
necessary for comparison, results in an approximation of the EI[x) maximum 
which is not necessarily within e tolerance of the true E[I{x)] maximum. 

4-3. Long run comparison 

The maximum EI values found by BNB and GA cannot be compared 
after the first new point is added because the augmented designs {XinuUXg^^} 
and {Xinit^x^^^} differ, as will the fitted GP surfaces and the EI surfaces. As 
a result, instead of comparing the maximum EI, the running best estimate 
of the features of interest (e.g., maximum) after adding A;-th new design point 
arc compared. We also use a non-sequential (static) approach, by fitting a 
GP model with a random maximin Latin hypercube design of size no + k, 
to serve as a benchmark for comparison. Ideally, both BNB and GA should 
outperform this non-sequential approach. Next, we present three examples 
to illustrate the comparisons. 
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Example 1 - Branin. Suppose the computer model output y = f{xi, X2) 
is generated using Branin function given by 



f{Xi,X2) = X2 



5.1x? 5xi 

H 

47r2 



7T 



6 + 10 1 



Stt 



cos(xi) + 10, (13) 



where xi,X2 G [0, 5], however, we rescale the inputs to [0, 1]^. The true global 
maximum, i/max = 55.6, and minimum, i/min = 0.398, of Branin function are 
attained at (xi,X2) = (0,0) and {xi,X2) = (0.62,0.42). Figure [3] presents the 
long run comparison of the BNB and GA optimizers for maximizing the EI 
criterion ([s]) developed for simultaneous estimation of the global maximum 
and minimum. For each realization, a random maximin Latin hypercube 
designs of size uq = 20 was chosen as initial design, and then nmw = 30 
additional trials were found sequentially (one at-a-time) by maximizing the 
El criterion and augmented to the design. The results are averaged over 100 
such realizations, and the error bars denote the simulation standard error. 




5 10 15 20 25 30 
Number of Points Added to Initial Design 





45 



w 
LU 

» 40 

D 





5 10 15 20 25 30 
Number of Points Added to Initial Design 



(a) Minimum 



(b) Maximum 



Figure 3: Estimated optima for Branin function (true maximum = 55.6, minimum 
= 0.398); BNB (blue - solid), GA (red - dashed), static (black - dash-dotted) 



It is clear from Figure [3] that the optimization of the EI criterion using 
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BNB (blue) leads to the true maximum in significantly fewer number of 
computer simulator runs compared to the (red) GA optimizer or baseline 
(black) search of a random maximin Latin hypercube. However the estimated 
global minimum does not seem to be significantly different for any of the 
optimization techniques, though on average BNB attains a closer estimate of 
the true minimum than GA which is in turn closer on average than the static 
method. This is somewhat intuitive as the Branin function is quite smooth 
and almost flat near the global minimum (see Figure 111) . 




A comparison of a static design of size uq + k and the sequential design 
with EI (10) optimized using BNB and GA when estimating a contour of 
height ?/ = 45 is presented in Figure [5] Their performance is measured in 
terms of the divergence between the true and estimated contour, given by 



^ lib 

- J2^yk{xc,i) - ay, (14) 
i=l 

where {x^i, i = 1, m} denotes the discretized true contour at y = 45, and 
yk{-) denotes the estimated process value after adding k new points or from 
a static design of size no + k. The results are averaged over 100 random 
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maximin Latin hypercube initial designs of size riQ = 20 each, and for each 
reahzation, we add Unew = 30 additional trials. 



12r 




Number of Points Added to Initial Design 

Figure 5: Estimated BNB(blue), GA(red), and static(black) contour divergence 
versus number of new points added to initial design for the Branin test function. 

Figure |5] displays the divergence dk for the Branin function with contour 
a.t y = 45. In this simulation BNB and GA methods perform similarly, both 
significantly better than the static design. The contour discrepancies of the 
sequential methods approach quickly; an indication that this contour is 
relatively simple to locate. Since the contour at y = 45 is in the very bottom 
left corner of the design region (see Figure 5), the maximin Latin hypercube 
design is less likely to place points very near the contour y = 45, which re- 
sults in poor performance for the static method. 

Example 2 - Levy 2D. Suppose the computer model output y = 
f{xi,X2) is generated using the Levy function given by 

d-l 

f{xi, ...,Xd) = sin^(7rwi) + ^(wfc - 1)^[1 + 10sin^(7rwfc + 1)] + {wd - of, 

k=l 

where = 1 + (x^ — l)/4 and Xk G [—10, 10] for k = 1, d. We rescale the 
inputs X = {xi, ...,Xd) to [0, 1]'^. For the two-dimensional Levy function (see 
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Figure [6]), the global maximum, Umax = 95.4, and minimum, ymin = 0, are 
attained at {xi,X2) = (0,0) and {xi,X2) = (0.55,0.55). 

Figure [7] presents the long run comparison of the BNB and GA optimizers 
for maximizing the EI criterion ^ developed for simultaneous estimation of 
the global maximum and minimum. As in Example 1, the results are averaged 
over 100 realizations with initial designs of size = 20 and Unew = 30 
additional trials. For this example, BNB is a clear winner and leads to 
better estimates of the global optima in much fewer simulator runs for both 
the maximum and minimum. As expected, GA performs significantly better 
than the static design. Figure |8] presents the divergence comparison of the 
sequential design with the two optimizers as well as a static design when 
estimating the contour ai y = 70. Here also, each initial design was of size 
uq = 20, Unew = 30 ucw trials were added sequentially, and the results are 
averaged over 100 initial random maximin Latin hypercube designs. As in 
the maximum and minimum case, the sequence of points added by the BNB 
(blue) optimization of EI (10) lead to quicker convergence of the contour 
divergence compared to GA (red). On average the static (black) designs 
provide relatively minimal improvement to the contour estimate. 
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Figure 7: Estimated optima for Levy 2D function (true minimum = 0, maximum 
= 95.4); BNB (blue - solid), GA (red - dashed), static (black - dash-dotted). 
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Figure 8: Estimated contour divergence comparison for the 2D Levy function 
(y = 70); BNB (blue - solid), GA (red - dashed) and static (black - dash-dotted). 
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Example 3 - Levy 4D. We now use an example of a simulator with four- 
dimensional inputs (Levy function) to demonstrate that careful optimization 
of the El criterion can save significant amount of simulator runs. The simu- 
lation results use 30 initial design points chosen from random maximin Latin 
hypercube, and 20 additional points added sequentially. Comparisons using 
this higher dimensional function may highlight the differences between the 
two methods of optimization. BNB and GA were each given a budget of 3000 
evaluations of the EI criterion. The true global maximum is approximately 
255 at (0, 0, 0, 0), and the minimum is at (0.55, 0.55, 0.55, 0.55). 

Figure |9] displays the comparison of our BNB algorithm with the GA and 
static designs in simultaneously estimating the maximum and minimum of 
the 4D Levy function. The results are averaged over 100 random maximin 
Latin hypercube initial designs of size rio = 30. BNB begins to locate both 
the minimum and maximum much more quickly than GA, which is in turn 
better than the static design. BNB is the only method that gets close to the 
maximum in 20 additional trials. All three methods are slow to locate the 
global maximum, a reflection of the complexity of the Levy function. 




5 10 15 20 5 10 15 20 

Number of Points Added to Initial Design Number of Points Added to Initial Design 

(a) Estimated Minimum (b) Estimated Maximum 

Figure 9: Estimated optima for Levy 4D function (true minimum = 0, maximum 
= 255); BNB (blue - solid), GA (red - dashed) and static (black - dash-dotted). 
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As before, we simulate the estimation of the y = 180 contour of the 
4D Levy function for BNB, GA, and static designs. The number of initial 
points no was 30 and 20 new points were added. The results are averaged 
over 100 random maximin Latin hypercube designs. Figure [lO] displays the 
contour divergence for the three designs - sequential with EI (10) optimized 
using BNB, GA and the static design. The new points selected by the BNB 
algorithm allow estimation of the 4D Levy y = 180 contour more accurately 
than the other two methods, decreasing the contour divergence much more 
rapidly after only one point was added. 

132r 

130-7 T 




Number of Points Added to Initial Design 

Figure 10: Estimated contour divergence comparison for the 4D Levy function 
(y = 180); BNB (blue - solid), GA (red - dashed) and static (black - dash-dotted). 

As illustrated in these examples, the branch and bound algorithm is quite 
competitive. For 2D and 4D Levy functions, BNB is superior to GA and 
static in estimation of contours as well as simultaneous estimation of the 
maximum and minimum. In estimating the maximum and minimum of the 
Branin function BNB locates the maximum much more quickly than the 
other methods, and all 3 methods are similar in estimating the minimum. 
For contours of the Branin function GA performs on par with BNB, both 
of which are preferable to a static design. The Branin function is relatively 
smooth, and hence optimization of EI criteria is relatively simple, which may 
explain the absence of differences between these methods in some cases. 
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5. Discussion 

We have generalized the expected improvement criterion for finding the 
maximum and minimum (simultaneously) as the features of interest, and 
demonstrated the use of this infill criterion on the Levy and Branin test 
functions. Also we have developed branch and bound algorithms for contour 
and maximum/minimum estimation, compared them with genetic algorithms 
and static designs, and have shown that the BNB outperforms the other two 
optimization approaches. This study demonstrate that careful optimization 
of the EI criterion can save significant amount of simulator runs. 

Note that we have used a 'stochastic version' of the BNB algorithm, and 
the approximation comes in from estimating the bounds of y(x), s(x) based 
on points in each rectangle Q G Qinit = [0, l]'^. Further work should be done 
to investigate maximization of the proposed EI criteria using a non-stochastic 
branch and bound algorithm. 
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